Search Results: "Ian Jackson"

2 September 2021

Ian Jackson: partial-borrow: references to restricted views of a Rust struct

tl;dr:
With these two crazy proc-macros you can hand out multiple (perhaps mutable) references to suitable subsets/views of the same struct.

Why

In Otter I have adopted a style where I try to avoid giving code mutable access that doesn't need it, and try to make mutable access come with some code structures to prevent "oh I forgot a thing" type mistakes. For example, mutable access to a game state is only available in contexts that have to return a value for the updates to send to the players. This makes it harder to forget to send the update.

But there is a downside. The game state is inside another struct, an Instance, and much code needs (immutable) access to it. I can't pass both &Instance and &mut GameState because one is inside the other. My workaround involves passing separate references to the other fields of Instance, leading to some functions taking far too many arguments: 14 in one case. (They're all different types, so argument ordering mistakes just result in compiler errors talking about arguments 9 and 11 having wrong types, rather than actual bugs.)

I felt this problem was purely a restriction arising from limitations of the borrow checker. I thought it might be possible to improve on it. Weeks passed and the question gradually wormed its way into my consciousness. Eventually, I tried some experiments. Encouraged, I persisted.
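To make the conflict concrete, here is a minimal sketch (struct and field names invented; this illustrates the restriction itself, not the crate's API):

    struct GameState { counter: u64 }

    struct Instance {
        game_state: GameState,
        log: Vec<String>,
        // ... many other fields ...
    }

    fn update(_ig: &Instance, _gs: &mut GameState) { /* ... */ }

    fn caller(ig: &mut Instance) {
        // Does not compile: cannot borrow `ig.game_state` as mutable
        // because `*ig` is also borrowed as immutable:
        //update(&*ig, &mut ig.game_state);

        // The workaround: pass each of the other fields separately.
        update_workaround(&ig.log, &mut ig.game_state);
    }

    fn update_workaround(_log: &[String], _gs: &mut GameState) { /* ... */ }

    fn main() {
        caller(&mut Instance { game_state: GameState { counter: 0 }, log: vec![] });
    }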
What and how

partial-borrow is a Rust library which solves this problem. You sprinkle #[derive(PartialBorrow)] and partial!(...) and then you can pass a reference which grants mutable access to only some of the fields. You can also pass a reference through which some fields are inaccessible. You can even split a single mut reference into multiple compatible references, for example granting mut access to mutually-nonoverlapping subsets.

The core type is Struct__Partial (for some Struct). It is a zero-sized type, but we prevent anyone from constructing one. Instead we magic up references to it, always ensuring that they have the same address as some Struct. The fields of Struct__Partial are also ZSTs that exist only as references, and they Deref to the actual field (subject to compile-time borrow compatibility checking).

Soundness and testing

partial-borrow is primarily a nontrivial procedural macro which autogenerates reams of unsafe. Of course I think it's sound, but I thought that the last two times before I added a test which demonstrated otherwise. So it might be fairer to say that I have tried to make it sound and that I don't know of any problems...

Reasoning about the correctness of macro-generated code is not so easy. One problem is that there is nowhere good to put the kind of soundness arguments you would normally add near uses of unsafe. I decided to solve this by annotating an instance of the macro output. There's a not very complicated script using diff3 to help fold in changes if the macro output changes - merge conflicts there mean a possible re-review of the argument text. Of course I also have test cases that run with miri, and test cases for expected compiler errors for uses that need to be forbidden for soundness.

But this is quite hairy and I'm worried that it might be rather "my first insane unsafe contraption". Also the pointer/reference trickery is definitely subtle, and depends heavily on knowing what Rust's aliasing and pointer provenance rules really are. Stacked Borrows is not entirely trivial to reason about in fiddly corner cases. So for now I have only called it 0.1.0 and left a note in the docs.

I haven't actually made Otter use it yet, but that's the rather more boring software integration part, not the fun "can I do this mad thing" part, so I will probably leave that for a rainy day. Possibly a rainy day after someone other than me has looked at partial-borrow (preferably someone who understands Stacked Borrows...).

Fun!

This was great fun. I even enjoyed writing the docs. The proc-macro programming environment is not entirely straightforward and there are a number of things to watch out for. For my first non-adhoc proc-macro this was, perhaps, ambitious. But you don't learn anything without trying...
edited 2021-09-02 16:28 UTC to fix a typo



17 August 2021

Ian Jackson: Releasing nailing-cargo 1.0.0

Summary

I have just tagged nailing-cargo/1.0.0. nailing-cargo is a wrapper around the Rust build tool cargo. nailing-cargo can:

Background and history

It's not really possible to make a nontrivial Rust project without using cargo. But the build process automatically downloads and executes code from crates.io, which is a minimally-curated repository. I didn't want to expose my main account to that.

And, at the time, I was working on a project for which I was also writing a library as a dependency, and I found that cargo couldn't cope with this unless I were to commit (to my git repository) the path (on my local laptop) of my dependency. I filed some bugs, including about the unpublished crate problem. But also, I was stubborn enough to try to find a workaround that didn't involve committing junk to my git history. The result was a short but horrific shell script. I wrote about this at the time (March 2019).

Over the last few years the difficulties I have with cargo have remained unresolved. I found my interactions with upstream rather discouraging. It didn't seem like I would get anywhere by trying to help improve cargo to better support my needs. So instead I have gradually improved nailing-cargo. It is now a Perl script. It is rather less horrific, and has proper documentation (sorry, JS needed because GitLab).

Why Perl ?

Rust would have been my language of choice. But I wanted to avoid a chicken-and-egg situation. When you're doing privsep, nailing-cargo has to run in your more privileged environment. I wanted something easy to get going with. nailing-cargo has to contain a TOML parser; and I found a small one, TOML-Tiny, which was good enough as a starting point, and small enough I could bundle it as a git subtree. Perl is nicely fast to start up (nailing-cargo --- true runs in about 170ms on my laptop), and it is easy to write a Perl script that will work on pretty much any Perl installation.

Still unsolved: embedding cargo in another build system

A number of my projects contain a mixture of Rust code with other languages. Unfortunately, nailing-cargo doesn't help with the problems which arise trying to integrate cargo into another build system. I generally resort to find runes for finding Rust source files that might influence cargo, and stamp files for seeing if I have run it recently enough (a sketch of this appears at the end of this post); and I simply live with the fact that cargo sometimes builds more stuff than I needed it to.

Future

There are a number of ways nailing-cargo could be improved. Notably, the need to overwrite your actual Cargo.toml is very annoying, even if nailing-cargo puts it back afterwards. A big problem with this is that it means that nailing-cargo has to take a lock while your cargo rune runs. This effectively prevents using nailing-cargo with long-running processes, notably editor integrations like rls and racer. I could perhaps solve this with more linkfarm-juggling, but that wouldn't help in-tree builds and it's hard to keep things up to date.

I am considering using LD_PRELOAD trickery or maybe bwrap(1) to "implement" the alternative Cargo.toml feature which was rejected by cargo upstream in 2019 (and again in April when someone else asked).

Currently there is no support for using sudo for out-of-tree privsep. This should be easy to add but it needs someone who uses sudo to want it (and to test it!)

The documentation has some other discussion of limitations, some of which aren't too hard to improve. Patches welcome!
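As an illustration of the stamp-file approach mentioned under "Still unsolved" (a sketch only: the find expression and target name are invented, recipe lines must begin with a hard tab, and the nailing-cargo invocation just follows the --- convention shown above):

    # Re-run cargo (via nailing-cargo) only when a Rust-relevant file changed.
    RUST_INPUTS := $(shell find src -name '*.rs') Cargo.toml

    cargo-stamp: $(RUST_INPUTS)
	nailing-cargo --- cargo build --release
	touch $@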


26 May 2021

Ian Jackson: Disconnecting from Freenode

I have just disconnected from irc.freenode.net for the last time. You should do the same. The awful new de facto operators are using user numbers as a public justification for their behaviour. Specifically, I recommend that you:

Note that mentioning libera in the channel topic of your old channels on freenode is likely to get your channel forcibly taken over by the new de facto operators of freenode. They won't tolerate you officially directing people to the competition.

I did an investigation and writeup of this situation for the Xen Project. It's a little out of date - it doesn't have the latest horrible behaviours from the new regime - but I think it is worth pasting it here:
Message-ID: <24741.12566.639691.461134@mariner.uk.xensource.com>
From: Ian Jackson <iwj@xenproject.org>
To: xen-devel@lists.xenproject.org
CC: community.manager@xenproject.org
Subject: IRC networks
Date: Wed, 19 May 2021 16:39:02 +0100
Summary:
We have for many years used the Freenode IRC network for real-time
chat about Xen.  Unfortunately, Freenode is undergoing a crisis.
There is a dispute between, on the one hand, Andrew Lee, and on the
other hand, all (or almost all) Freenode volunteer staff.  We must
make a decision.
I have read all the publicly available materials and asked around with
my contacts.  My conclusions:
 * We do not want to continue to use irc.freenode.*.
 * We might want to use libera.chat, but:
 * Our best option is probably to move to OFTC https://www.oftc.net/
Discussion:
Firstly, my starting point.
I have been on IRC since at least 1993.  Currently my main public
networks are OFTC and Freenode.
I do not have any personal involvement with public IRC networks.  Of
the principals in the current Freenode dispute, I have only heard of
one, who is a person I have experience of in a Debian context but have
not worked closely with.
George asked me informally to use my knowledge and contacts to shed
light on the situation.  I decided that, having done my research, I
would report more formally and publicly here rather than just
informally to George.
Historical background:
 * Freenode has had drama before.  In about 2001 OFTC split off from
   Freenode after an argument over governance.  IIRC there was drama
   again in 2006.  A significant proportion of the Free Software world,
   including Debian, now use OFTC.  Debian switched in 2006.
Facts that I'm (now) pretty sure of:
 * Freenode's actual servers run on donated services; that is,
   the hardware is owned by those donating the services, and the
   systems are managed by Freenode volunteers, known as "staff".
 * The freenode domain names are currently registered to a limited
   liability company owned by Andrew Lee (rasengan).
 * At least 10 Freenode staff have quit in protest, writing similar
   resignation letters protesting about Andrew Lee's actions [1].  It
   does not appear that Andrew Lee has the public support of any
   Freenode staff.
 * Andrew Lee claims that he "owns" Freenode.[2]
 * A large number of channel owners for particular Free Software
   projects who previously used Freenode have said they will switch
   away from Freenode.
Discussion and findings on Freenode:
There is, as might be expected, some murk about who said what to whom
when, what promises were made and/or broken, and so on.  The matter
was also complicated by the leaking earlier this week of draft(s) of
(at least one of) the Freenode staffers' resignation letters.
Andrew Lee has put forward a position statement [2].  A large part of
the thrust of that statement is allegations that the current head of
Freenode staff, tomaw, "forced out" the previous head, christel.  This
allegation is strongly disputed by all those current (resigning)
Freenode staff I have seen comment.  In any case it does not seem to
be particularly germane; in none of my reading did tomaw seem to be
playing any kind of leading role.  tomaw is not mentioned in the
resignation letters.
Some of the links led me to logs of discussions on #freenode.  I
read some of these in particular[3].  NB I haven't been able to verify
that these logs have not been tampered with.  Having said that and
taking the logs at face value, I found the rasengan writing there to
be disingenuous and obtuse.
Andrew Lee has been heavily involved in Bitcoin.  Bitcoin is a hive of
scum and villainy, a pyramid scheme, and an environmental disaster,
all rolled into one.  This does not make me think well of Lee.
Additionally, it seems that Andrew Lee has been involved in previous
governance drama involving a different IRC network, Snoonet.
I have come to the very firm conclusion that we should have nothing to
do with Andrew Lee, and avoid using services that he has some
effective control over.
Alternatives:
The departing Freenode staff are setting up a replacement,
"libera.chat".  This is operational but still suffering from teething
problems and of course has a significant load as it deals with an
influx of users on a new setup.
On the staff and trust question: As I say, I haven't heard of any of
the Freenode staff, with one exception.  Unfortunately the one
exception does not inspire confidence in me[4] - although NB that is
only one data point.
On the other hand, Debian has had many many years of drama-free
involvement with OFTC.  OFTC has a formal governance arrangement and
it is associated with Software in the Public Interest.  I notice that
the last few of OFTC's annual officer elections have been run partly by
Steve McIntyre.  Steve is a friend of mine (and he is a former Debian
Project Leader) and I take his involvement as a good sign.
I recommend that we switch to using OFTC as soon as possible.
Ian.
References:
Starting point for the resigning Freenode staff's side [1]:
  https://gist.github.com/joepie91/df80d8d36cd9d1bde46ba018af497409
Andrew Lee's side [2]:
  https://gist.github.com/realrasengan/88549ec34ee32d01629354e4075d2d48
[3]
https://paste.sr.ht/~ircwright/7e751d2162e4eb27cba25f6f8893c1f38930f7c4
[4] I won't give the name since I don't want to be shitposting.



23 May 2021

Ian Jackson: Otter game server - now with uploadable game bundles

In April I wrote about releasing Otter, which has been one of my main personal projects during the pandemic.

Uploadable game materials

Otter comes with playing cards and a chess set, and some ancillary bits and bobs. Until now, if you wanted to play with something else, the only way was to read some rather frightening build instructions to add your pieces to Otter itself, or to dump a complicated structure of extra files into the server install. Now that I have released Otter 0.6.0, you can upload a zipfile to the server. The format of the zipfile is even documented!

Otter development - I still love Rust

Working on Otter has been great fun, partly because it has involved learning lots of new stuff, and partly because it's mostly written in Rust. I love Rust; one of my favourite aspects is the way it's possible to design a program so that most of your mistakes become compile errors. This means you spend more time dealing with compiler errors and less time peering at a debugger trying to figure out why it has gone wrong.

Future plans - help wanted!

So far, Otter has been mostly a one-person project. I would love to have some help. There are two main areas where I think improvement is very important: If you think you could help with these, and playing around with a game server sounds fun, do get in touch.

For now, next on my todo list is to provide a nicely cooked git-forge-style ssh transport facility, so that you don't need a shell account on the server but can run the otter command line client tool locally.


19 April 2021

Ian Jackson: Otter - a game server for arbitrary board games

One of the things that I found most vexing about lockdown was that I was unable to play some of my favourite board games. There are online systems for many games, but not all. And online systems cannot support games like Mao where the players make up the rules as we go along. I had an idea for how to solve this problem, and set about implementing it. The result is Otter (the Online Table Top Environment Renderer). We have played a number of fun games of Penultima with it, and have recently branched out into Mao. The Otter is now ready to be released!

More about Otter (cribbed shamelessly from the README)

Otter, the Online Table Top Environment Renderer, is an online game system. But it is not like most online game systems. It does not know (nor does it need to know) the rules of the game you are playing. Instead, it lets you and your friends play with common tabletop/boardgame elements such as hands of cards, boards, and so on. So it's something like a tabletop simulator (but it does not have any 3D, or a physics engine, or anything like that). This means that with Otter:

Installation and usage

Otter is fully functional, but the installation and account management arrangements are rather unsophisticated and un-webby. And there is not currently any publicly available instance you can use to try it out. Users on chiark will find an instance there. Other people who are interested in hosting games (of Penultima or Mao, or other games we might support) will have to find a Unix host or VM to install Otter on, and will probably want help from a Unix sysadmin. Otter is distributed via git, and is available on Salsa, Debian's gitlab instance. There is documentation online.

Future plans

I have a number of ideas for improvement, which go off in many different directions. Quite high up on my priority list is making it possible for players to upload and share game materials (cards, boards, pieces, and so on), rather than just using the ones which are bundled with Otter itself (or dumping files ad-hoc on the server). This will make it much easier to play new games. One big reason I wrote Otter is that I wanted to liberate boardgame players from the need to implement their game rules as computer code.

The game management and account management is currently done with a command line tool. It would be lovely to improve that, but making a fully-featured management web ui would be a lot of work.

Screenshots!

(Click for the full size images.)


15 April 2021

Ian Jackson: Dreamwidth blocking many RSS readers and aggregators

There is a serious problem with Dreamwidth, which is impeding access for many RSS reader tools. This started at around 0500 UTC on Wednesday morning, according to my own RSS reader cron job. A friend found #43443 in the DW ticket tracker, where a user of a minority web browser found they were blocked. Local tests demonstrated that Dreamwidth had applied blocking by the HTTP User-Agent header, and were rejecting all user-agents not specifically permitted. Today, this rule has been relaxed and unknown user-agents are permitted. But user-agents for general http client libraries are still blocked.

I'm aware of three unresolved tickets about this: #43444 #43445 #43447

We're told there by a volunteer member of Dreamwidth's support staff that this has been done deliberately for "blocking automated traffic". I'm sure the volunteer is just relaying what they've been told by whoever is struggling to deal with what I suppose is probably a spam problem. But it's still rather unsatisfactory. I have suggested in my own ticket that a good solution might be to apply the new block only to posting and commenting (eg, maybe, by applying it only to HTTP POST requests). If the problem is indeed spam then that ought to be good enough, and would still let RSS readers work properly.

I'm told that this new blocking has been done by "implementing" (actually, configuring or enabling) "some AWS rules for blocking automated traffic". I don't know what facilities AWS provides. This kind of helplessness is of course precisely the kind of thing that the Free Software movement is against and precisely the kind of thing that proprietary services like AWS produce.

I don't know if this blog entry will appear on planet.debian.org and on other people's readers and aggregators. I think it will at least be seen by other Dreamwidth users. I thought I would post here in the hope that other Dreamwidth users might be able to help get this fixed. At the very least other Dreamwidth blog owners need to know that many of their readers may not be seeing their posts at all.

If this problem is not fixed I will have to move my blog. One of the main points of having a blog is publishing it via RSS. RSS readers are of course based on general http client libraries and many if not most RSS readers have not bothered to customise their user-agent. Those are currently blocked.


23 March 2021

Ian Jackson: Signing the open letter about RMS

I have signed the open letter calling for RMS to be removed from all leadership positions, including the GNU Project, and for the whole FSF board to step down. Here is what I wrote in my commit message:
    I published my first Free Software in 1989, under the GNU General
    Public Licence.  I remain committed to the ideal of software freedom,
    for everyone.
    
    When the CSAIL list incident blew up I was horrified to read the
    stories of RMS's victims.  We have collectively failed those people
    and I am glad to see that many of us are working to make a better
    community.
    
    I have watched with horror as RMS has presided over, and condoned,
    astonishingly horrible behaviour in many GNU Project discussion
    venues.
    
    The Free Software Foundation Board are doing real harm by reinstating
    an abuser.  I had hoped for and expected better from them.
    
    RMS's vision and ideals of software freedom have been inspiring to me.
    But his past behaviour and current attitudes mean he must not and can
    not be in a leadership position in the Free Software community.
Comments are disabled. Edited 2021-03-23T21:44Z to fix a typo in the commit message.



8 November 2020

Ian Jackson: Gazebo out of scaffolding

Today we completed our gazebo, which we designed and built out of scaffolding:

[Picture of gazebo]

Scaffolding is fairly expensive but building things out of it is enormous fun! You can see a complete sequence of the build process, including pictures of the "engineering maquette", at https://www.chiark.greenend.org.uk/~ijackson/2020/scaffold/

Post-lockdown maybe I will build a climbing wall or something out of it...
edited 2020-11-08 20:44Z to fix img url following hosting reorg



2 October 2020

Ian Jackson: Mailman vs DKIM - a novel solution

tl;dr: Do not configure Mailman to replace the mail domains in From: headers. Instead, try out my small new program which can make your Mailman transparent, so that DKIM signatures survive.

Background and narrative

DKIM

NB: This explanation is going to be somewhat simplified. I am going to gloss over some details and make some slightly approximate statements.

DKIM is a new anti-spoofing mechanism for Internet email, intended to help fight spam. DKIM, paired with the DMARC policy system, has been remarkably successful at stemming the flood of joe-job spams. As usually deployed, DKIM works like this: When a message is originally sent, the author's MUA sends it to the MTA for their From: domain for outward delivery. The From: domain mailserver calculates a cryptographic signature of the message, and puts the signature in the headers of the message. Obviously not the whole message can be signed, since at the very least additional headers need to be added in transit, and sometimes headers need to be modified too. The signing MTA gets to decide what parts of the message are covered by the signature: they nominate the header fields that are covered by the signature, and specify how to handle the body. A recipient MTA looks up the public key for the From: domain in the DNS, and checks the signature. If the signature doesn't match, depending on policy (originator's policy, in the DNS, and recipient's policy of course), typically the message will be treated as spam.

The originating site has a lot of control over what happens in practice. They get to publish a formal (DMARC) policy in the DNS which advises recipients what they should do with mails claiming to be from their site. As mentioned, they can say which headers are covered by the signature - including the ability to sign the absence of a particular header - so they can control which headers downstreams can get away with adding or modifying. And they can set a normalisation policy, which controls how precisely the message must match the one that they sent.

Mailman

Mailman is, of course, the extremely popular mailing list manager. There are a lot of things to like about it. I choose to run it myself not just because it's popular but also because it provides a relatively competent web UI and a relatively competent email (un)subscription interface, decent bounce handling, and a pretty good set of moderation and posting access controls. The Xen Project mailing lists also run on mailman.

Recently we had some difficulties with messages sent by Citrix staff (including myself), to Xen mailing lists, being treated as spam. Recipient mail systems were saying the DKIM signatures were invalid. This was in fact true. Citrix has chosen a fairly strict DKIM policy; in particular, they have chosen "simple" normalisation - meaning that signed message headers must precisely match in syntax as well as in a semantic sense. Examining the failing-DKIM messages showed that this was definitely a factor.

Applying my Opinions about email

My Bayesian priors tend to suggest that a mail problem involving corporate email is the fault of the corporate email. However in this case that doesn't seem true to me. My starting point is that I think mail systems should not modify messages unnecessarily. None of the DKIM-breaking modifications made by Mailman seemed necessary to me. I have on previous occasions gone to corporate IT and requested quite firmly that things I felt were broken should be changed. But it seemed wrong to go to corporate IT and ask them to change their published DKIM/DMARC policy to accommodate a behaviour in Mailman which I didn't agree with myself. I felt that instead I should put (with my Xen Project hat on) my own house in order.

Getting Mailman not to modify messages

So, I needed our Mailman to stop modifying the headers. I needed it to not even reformat them. A brief look at the source code to Mailman showed that this was not going to be so easy. Mailman has a lot of features whose very purpose is to modify messages. Personally, as I say, I don't much like these features. I think the subject line tags, CC list manipulations, and so on, are a nuisance and not really Proper. But they are definitely part of why Mailman has become so popular and I can definitely see why the Mailman authors have done things this way. But these features mean Mailman has to disassemble incoming messages, and then reassemble them again on output. It is very difficult to do that and still faithfully reassemble the original headers byte-for-byte in the case where nothing actually wanted to modify them. There are existing bug reports[1] [2] [3] [4]; I can see why they are still open.

Rejected approach: From:-mangling

This situation is hardly unique to the Xen lists. Many others have struggled with it. The best that seems to have been come up with so far is to turn on a new Mailman feature which rewrites the From: header of the messages that go through it, to contain the list's domain name instead of the originator's. I think this is really pretty nasty. It breaks normal use of email, such as reply-to-author. It is having Mailman do additional mangling of the message in order to solve the problems caused by other undesirable manglings!

Solution!

As you can see, I asked myself: I want Mailman to not modify messages at all; how can I get it to do that? Given the existing structure of Mailman - with a lot of message-modifying functionality - that would really mean adding a bypass mode. It would have to spot, presumably depending on config settings, that messages were not to be edited; and then, it would avoid disassembling and reassembling the message at all, and bypass the message modification stages. The message would still have to be parsed of course - it's just that the copy sent out ought to be pretty much the incoming message. When I put it to myself like that I had a thought: couldn't I implement this outside Mailman? What if I took a copy of every incoming message, and then post-processed Mailman's output to restore the original? It turns out that this is quite easy and works rather well!

outflank-mailman

outflank-mailman is a 233-line script, plus documentation, installation instructions, etc. It is designed to run from your MTA, on all messages going into, and coming from, Mailman. On input, it saves a copy of the message in a sqlite database, and leaves a note in a new Outflank-Mailman-Id header. On output, it does some checks, finds the original message, and then combines the original incoming message with carefully-selected headers from the version that Mailman decided should be sent. This was deployed for the Xen Project lists on Tuesday morning and it seems to be working well so far. If you administer Mailman lists, and fancy some new software to address this problem, please do try it out.
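Roughly, the idea is the following (a conceptual sketch only, not the real outflank-mailman, which is a Perl script and uses a sqlite database; the spool path, header-selection list, and all names here are invented for illustration):

    use std::fs;
    use std::io::{self, Read};

    const SPOOL: &str = "/var/spool/outflank";   // assumed spool directory
    const ID_HDR: &str = "Outflank-Mailman-Id";

    // Input side: stash a verbatim copy of the message, tagged with an id.
    fn inbound(msg: &str, id: u64) -> io::Result<String> {
        fs::write(format!("{SPOOL}/{id}"), msg)?;
        Ok(format!("{ID_HDR}: {id}\r\n{msg}"))   // prepend tracking header
    }

    // Output side: recover the stashed copy; keep only selected headers
    // from Mailman's rewritten version (selection much simplified here).
    fn outbound(mailman_msg: &str) -> io::Result<String> {
        let id = mailman_msg
            .lines()
            .find(|l| l.starts_with(ID_HDR))
            .and_then(|l| l.split(": ").nth(1))
            .expect("message did not pass through the input side");
        let original = fs::read_to_string(format!("{SPOOL}/{id}"))?;
        let keep = ["List-Id:", "List-Unsubscribe:", "List-Post:"];
        let added: String = mailman_msg
            .lines()
            .take_while(|l| !l.is_empty())       // header block only
            .filter(|l| keep.iter().any(|k| l.starts_with(k)))
            .map(|l| format!("{l}\r\n"))
            .collect();
        Ok(format!("{added}{original}"))
    }

    fn main() -> io::Result<()> {
        // Demo round trip: read a message on stdin, stash it, then restore.
        let mut msg = String::new();
        io::stdin().read_to_string(&mut msg)?;
        let stamped = inbound(&msg, 1)?;
        print!("{}", outbound(&stamped)?);
        Ok(())
    }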
Matters arising - Mail filtering, DKIM

Overall I think DKIM is a helpful contribution to the fight against spam (unlike SPF, which is fundamentally misdirected and also broken). Spam is an extremely serious problem; most receiving mail servers experience more attempts to deliver spam than real mail, by orders of magnitude. But DKIM is not without downsides.

Inherent in the design of anything like DKIM is that arbitrary modification of messages by list servers is no longer possible. In principle it might be possible to design a system which tolerated modifications reasonable for mailing lists, but it would be quite complicated and have to somehow not tolerate similar modifications in other contexts. So DKIM means that lists can no longer add those unsubscribe footers to mailing list messages. The "new way" (RFC2369, July 1998) to do this is with the List-Unsubscribe header. Hopefully a good MUA will be able to deal with unsubscription semiautomatically, and I think by now an adequate MUA should at least display these headers by default.

Sender:

There are implications for recipient-side filtering too. The "traditional" correct way to spot mailing list mail was to look for Resent-To:, which can be added without breaking DKIM; the "new" (RFC2919, March 2001) correct way is List-Id:, likewise fine. But during the initial deployment of outflank-mailman I discovered that many subscribers were detecting that a message was list traffic by looking at the Sender: header. I'm told that some mail systems (apparently Microsoft's included) make it inconvenient to filter on List-Id.

Really, I think a mailing list ought not to be modifying Sender:. Given Sender:'s original definition and semantics, there might well be reasonable reasons for a mailing list posting to have different From: and Sender:, and then the original Sender: ought not to be lost. And a mailing list's operation does not fit well into the original definition of Sender:. I suspect that list software likes to put in Sender: mostly for historical reasons; notably, a long time ago it was not uncommon for broken mail systems to send bounces to the Sender: header rather than the envelope sender (SMTP MAIL FROM).

DKIM makes this more of a problem. Unfortunately the DKIM specifications are vague about what headers one should sign, but they pretty much definitely include Sender: if it is present, and some materials encourage signing the absence of Sender:. The latter is Exim's default configuration when DKIM-signing is enabled.

Frankly there seems little excuse for systems to not readily support and encourage filtering on List-Id, 20 years later, but I don't want to make life hard for my users. For now we are running a compromise configuration: if there wasn't a Sender: in the original, take Mailman's added one. This will result in (i) misfiltering for some messages whose poster put in a Sender:, and (ii) DKIM failures for messages whose originating system signed the absence of a Sender:. I'm going to mine the db for some stats after it's been deployed for a week or so, to see which of these problems is worst and decide what to do about it.

Mail routing

For DKIM to work, messages being sent From: a particular mail domain must go through a system trusted by that domain, so they can be signed. Most users tend to do this anyway: their mail provider gives them an IMAP server and an authenticated SMTP submission server, and they configure those details in their MUA. The MUA has a notion of "accounts" and, according to the user's selection for an outgoing message, connects to the authenticated submission service (usually using TLS over the global internet).

Trad unix systems where messages are sent using the local sendmail or localhost SMTP submission (perhaps by automated systems, or perhaps by human users) are fine too. The smarthost can do the DKIM signing.

But this solution is awkward for a user of a trad MUA in what I'll call "alias account" setups: where a user has an address at a mail domain belonging to different people to the system on which they run their MUA (perhaps even several such aliases for different hats). Traditionally this worked by the mail domain forwarding the incoming mail, and the user simply self-declaring their identity at the alias domain. Without DKIM there is nothing stopping anyone self-declaring their own From: line. If DKIM is to be enabled for such a user (preventing people forging mail as that user), the user will have to somehow arrange that their trad unix MUA's outbound mail stream goes via their mail alias provider. For a single-user sending unix system this can be done with tolerably complex configuration in an MTA like Exim. For shared systems this gets more awkward and might require some hairy shell scripting etc.
edited 2020-10-01 21:22 and 21:35 and -02 10:50 +0100 to fix typos and 21:28 to linkify "my small program" in the tl;dr



17 August 2020

Ian Jackson: Doctrinal obstructiveness in Free Software

Any software system has underlying design principles, and any software project has process rules. But I seem to be seeing more often a pathological pattern where abstract and shakily-grounded broad principles, and even contrived and sophistic objections, are used to block sensible changes. Today I will go through an example in detail, before ending with a plea:

PostgreSQL query planner, WITH [MATERIALIZED] optimisation fence

Background history

PostgreSQL has a sophisticated query planner which usually gets the right answer. For good reasons, the pgsql project has resisted providing lots of knobs to control query planning. But there are a few ways to influence the query planner, for when the programmer knows more than the planner. One of these is the use of a WITH common table expression. In pgsql versions prior to 12, the planner would first make a plan for the WITH clause; and then, it would make a plan for the second half, counting the WITH clause's likely output as a given. So WITH acts as an "optimisation fence". This was documented in the manual - not entirely clearly, but a careful reading of the docs reveals this behaviour:
The WITH query will generally be evaluated as written, without suppression of rows that the parent query might discard afterwards.
Users (authors of applications which use PostgreSQL) have been using this technique for a long time.

New behaviour in PostgreSQL 12

In PostgreSQL 12 upstream were able to make the query planner more sophisticated. In particular, it is now often capable of looking "into" the WITH common table expression. Much of the time this will make things better and faster. But if WITH was being used for its side-effect as an optimisation fence, this change will break things: queries that ran very quickly in earlier versions might now run very slowly. Helpfully, pgsql 12 still has a way to specify an optimisation fence: specifying WITH ... AS MATERIALIZED in the query. So far so good.

Upgrade path for existing users of WITH fence

But what about the upgrade path for existing users of the WITH fence behaviour? Such users will have to update their queries to add AS MATERIALIZED. This is a small change. Having to update a query like this is part of routine software maintenance and not in itself very objectionable. However, this change cannot be made in advance because pgsql versions prior to 12 will reject the new syntax. So the users are in a bit of a bind. The old query syntax can be unusably slow with the new database and the new syntax is rejected by the old database. Upgrading both the database and the application, in lockstep, is a flag day upgrade, which every good sysadmin will want to avoid.

A solution to this problem

Colin Watson proposed a very simple solution: make the earlier PostgreSQL versions accept the new MATERIALIZED syntax. This is correct since the new syntax specifies precisely the actual behaviour of the old databases. It has no deleterious effect on any users of older pgsql versions. It makes it possible to add the new syntax to the application, before doing the database upgrade, decoupling the two upgrades. Colin Watson even provided an implementation of this proposal.

The solution is rejected by upstream

Unfortunately upstream did not accept this idea. You can read the whole thread yourself if you like. But in summary, the objections were (italic indicates literal quotes): I find these extremely unconvincing, even taken together. Many of them are very unattractive things to hear one's upstream saying. At best they are knee-jerk and inflexible application of very general principles. The authors of these objections seem to have lost sight of the fact that these principles have a purpose. When these kind of software principles work against their purposes, they should be revised, or exceptions made. At worst, it looks like a collective effort to find reasons - any reasons, no matter how bad - not to make this change.

The OFFSET 0 trick

One of the responses in the thread mentions OFFSET 0. As part of writing the queries in the Xen Project CI system, and preparing for our system upgrade, I had carefully read the relevant pgsql documentation. This OFFSET 0 trick was new to me. But, now that I know the answer, it is easy to provide the right search terms and find, for example, this answer on stackmumble. Apparently adding a no-op OFFSET 0 to the subquery defeats the pgsql 12 query planner's ability to see into the subquery.
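Concretely, the two fence spellings look like this (a sketch; the table and column names are invented):

    -- PostgreSQL >= 12 only: the explicit fence; older versions reject it
    WITH w AS MATERIALIZED (
        SELECT job, status FROM flights
    )
    SELECT * FROM w WHERE status = 'pass';

    -- Accepted by both old and new versions: the no-op OFFSET 0 fence
    SELECT * FROM (
        SELECT job, status FROM flights OFFSET 0
    ) AS w WHERE status = 'pass';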
"I think OFFSET 0 is the better approach since it's more obviously a hack showing that something weird is going on, and it's unlikely we'll ever change the optimiser behaviour around OFFSET 0 ... wheras hopefully CTEs will become inlineable at some point"

CTEs became inlineable by default in PostgreSQL 12.
So in fact there is a syntax for an optimisation fence that is accepted by both earlier and later PostgreSQL versions. It's even recommended by pgsql devs. It's just not documented, and is described by pgsql developers as a "hack". Astonishingly, the fact that it is a "hack" is given as a reason to use it! Well, I have therefore deployed this "hack". No doubt it will stay in our codebase indefinitely.

Please don't be like that!

I could come up with a lot more examples of other projects that have exhibited similar arrogance. It is becoming a plague! But every example is contentious, and I don't really feel I need to annoy a dozen separate Free Software communities. So I won't make a laundry list of obstructiveness.

If you are an upstream software developer, or a distributor of software to users (eg, a distro maintainer), you have a lot of practical power. In theory it is Free Software so your users could just change it themselves. But for a user or downstream, carrying a patch is often an unsustainable amount of work and risk. Most of us have patches we would love to be running, but which we haven't even written because simply running a nonstandard build is too difficult, no matter how technically excellent our delta.

As an upstream, it is very easy to get into a mindset of defending your code's existing behaviour, and to turn your project's guidelines into inflexible rules. Constant exposure to users who make silly mistakes, and rudely ask for absurd changes, can lead to core project members feeling embattled. But there is no need for an upstream to feel embattled! You have the vast majority of the power over the software, and over your project communication fora. Use that power consciously, for good.

I can't say that arrogance will hurt you in the short term. Users of software with obstructive upstreams do not have many good immediate options. But we do have longer-term choices: we can choose which software to use, and we can choose whether to try to help improve the software we use. After reading Colin's experience, I am less likely to try to help improve the experience of other PostgreSQL users by contributing upstream. It doesn't seem like there would be any point. Indeed, instead of helping the PostgreSQL community I am now using them as an example of bad practice. I'm only half sorry about that.


2 August 2020

Holger Levsen: 20200802-debconf4

DebConf4

This tshirt is 16 years old and from DebConf4. Again, I should probably wash it at 60 degrees Celsius for once...

DebConf4 was my 2nd DebConf and took place in Porto Alegre, Brazil. Like many DebConfs, it was a great opportunity to meet people: I remember sitting in the lobby of the venue and some guy asked me what I did in Debian and I told him about my little involvements and then asked him what he was doing, and he told me he wanted to become involved in Debian again, after getting distracted away. His name was Ian Murdock...

DebConf4 also had a very cool history session in the hallway track (IIRC, but see below) with Bdale Garbee, Ian Jackson and Ian Murdock and with a young student named Biella Coleman busy writing notes. That same hallway also saw the kickoff meeting of the Debian Women project, though sadly today http://tinc.debian.net ("there's no cabal") only shows an apache placeholder page and not a picture of that meeting.

DebConf4 was also the first time I got a bit involved in preparing DebConf: together with Jonas Smedegaard I set up some computers there, using FAI. I had no idea that this was the start of me contributing to DebConfs for the next ten years. And of course I also saw some talks, including one which I really liked, which then in turn made me notice there were no people doing video recordings, which then led to something...

I missed the group picture of this one. I guess it's important to me to mention it because I've met very wonderful people at this DebConf... (some mentioned in this post, some not. You know who you are!)

Afterwards some people stayed in Porto Alegre for FISL, where we saw Lawrence Lessig present Creative Commons to the world for the first time. On the flight back I sat next to a very friendly guy from Poland and we talked almost the whole flight and then we never saw each other again, until 15 years later in Asia... Oh, and then, after DebConf4, I used IRC for the first time. And stayed in the #debconf4 IRC channel for quite some years :)

Finally, DebConf4 and more importantly FISL, which was really big (5000 people?), and after that the Wizards of OS conference in Berlin (which had a very nice talk about Linux in different places in the world, illustrating the different states of 'first they ignore you, then they laugh at you, then they fight you, then you win'), made me quit my job at a company supporting Windows- and Linux-setups, as I realized I'd better start freelancing with Linux-only jobs. So, once again, my life would have been different if I had not attended these events!

Note: yesterday's post about DebConf3 was thankfully corrected twice. This might well happen to this post too! :)

14 July 2020

Ian Jackson: MessagePack vs CBOR (RFC7049)

tl;dr: Use MessagePack, rather than CBOR.

Introduction

I recently wanted to choose a binary encoding. This was for a project using Rust serde, so I looked at the list of formats there. I ended up reading about CBOR and MessagePack. Both of these are binary formats for a JSON-like data model. Both of them are "schemaless", meaning you can decode them without knowing the structure. (This also provides some forwards compatibility.) They are, in fact, quite similar (although they are totally incompatible). This is no accident: CBOR is, effectively, a fork of MessagePack. Both formats continue to exist and both are being used in new programs. I needed to make a choice but lacked enough information. I thought I would try to examine the reasons and nature of the split, and to make some kind of judgement about the situation. So I did a lot of reading [11]. Here are my conclusions.

History and politics

Between about 2010 and 2013 there was only MessagePack. Unfortunately, MessagePack had some problems. The biggest of these was that it lacked a separate string type. Strings were to be encoded simply as byte blocks. This caused serious problems for many MessagePack library implementors: for example, when decoding a MessagePack file the Python library wouldn't know whether to produce a Python bytes object, or a string. Straightforward data structures wouldn't round trip through MessagePack. [1] [2]

It seems that in late 2012 this came to the attention of someone with an IETF background. According to them, after unsatisfactory conversations with MessagePack upstream, they decided they would have to fork. They submitted an Internet-Draft for a partially-incompatible protocol [3] [4]. Little seemed to happen in the IETF until soon before the Orlando in-person IETF meeting in February 2013.[5]

These conversations sparked some discussion in the MessagePack issue tracker. There were long threads, including about process [1,2,4 ibid]. But there was also a useful technical discussion, about proposed backward-compatible improvements to the MessagePack spec.[6] The prominent IETF contributor provided some helpful input in these discussions in the MessagePack community - but also pushed quite hard for a "tagging" system, which suggestion was not accepted (see my technical analysis, below). An improved MessagePack spec resulted, with string support, developed largely by the MessagePack community. It seems to have been available in useable form since mid-2013 and was officially published as canonical in August 2013.

Meanwhile a parallel process was pursued in the IETF, based on the IETF contributor's fork, with 11 Internet-Drafts from February[7] to September[8]. This seems to have continued even though the original technical reason for the fork - lack of string vs binary distinction - no longer applied. The IETF proponent expressed unhappiness about MessagePack's stewardship and process as much as they did about the technical details [4, ibid]. The IETF process culminated in the CBOR RFC[9].

The discussion on process questions between the IETF proponent and MessagePack upstream, in the MessagePack issue tracker [4, ibid], should make uncomfortable reading for IETF members. The IETF acceptance of CBOR despite clear and fundamental objections from MessagePack upstream[13] and indeed other respected IETF members[14] does not reflect well on the IETF. The much-vaunted openness of the IETF process seems to have been rather one-sided. The IETF proponent here was an IETF Chair. Certainly the CBOR author was very well-spoken and constantly talks about politeness and cooperation and process; but what they actually did was very hostile. They accused the MessagePack community of an "us and them" attitude while simultaneously pursuing a forked specification!

The CBOR RFC does mention MessagePack in Appendix E.2. But not to acknowledge that CBOR was inspired by MessagePack. Rather, it does so to make a set of tendentious criticisms of MessagePack. Perhaps these criticisms were true when they were first written in an I-D, but they were certainly false by the time the RFC was actually published, which occurred after the MessagePack improvement process was completely concluded, with a formal spec issued.

Since then both formats have existed in parallel. Occasionally people discuss which one is better, and sometimes it is alleged that "yes CBOR is the successor to MessagePack", which is not really fair.[9][10]

Technical differences

The two formats have a similar arrangement: an initial byte which can encode small integers, or a type and length, or a type and a specification of a longer length encoding. But there are important differences. Overall, MessagePack is very significantly simpler.

Floating point

CBOR supports five floating point formats! Not only three sizes of IEEE754, but also decimal floating point, and bigfloats. This seems astonishing for a supposedly-simple format. (Some of these are supported via the semi-optional tag mechanism - see below.)

Indefinite strings and arrays

Like MessagePack, CBOR mostly precedes items with their length. But CBOR also supports "indefinite" strings, arrays, and so on, where the length is not specified at the beginning. The object (array, string, whatever) is terminated by a special "break" item. This seems to me to be a mistake. If you wanted the kind of application where MessagePack or CBOR would be useful, streaming sub-objects of unknown length is not that important. This possibility considerably complicates decoders.

CBOR tagging system

CBOR has a second layer of sort-of-type which can be attached to each data item. The set of possible tags is open-ended and extensible, but the CBOR spec itself gives tag values for: two kinds of date format; positive and negative bignums; decimal floats (see above); binary but expected to be encoded if converted to JSON (in base64url, base64, or base16); nestedly encoded CBOR; URIs; base64 data (two formats); regexps; MIME messages; and a special tag to make file(1) work. In practice it is not clear how many of these are used, but a decoder must be prepared to at least discard them. The amount of additional spec complexity here is quite astonishing. IMO binary formats like this will (just like JSON) be used by a next layer which always has an idea of what the data means, including (where the data is a binary blob) what encoding it is in etc. So these tags are not useful.

These tags might look like a middle way between (i) extending the binary protocol with a whole new type such as an extension type (incompatible with old readers) and (ii) encoding your new kind of data in an existing type (leaving all readers who don't know the schema to print it as just integers or bytes or strings). But I think they are more trouble than they are worth. The tags are uncomfortably similar to the ASN.1 tag system, which is widely regarded as one of ASN.1's unfortunate complexities.

MessagePack extension mechanism

MessagePack explicitly reserves some encoding space for users and for future extensions: there is an "extension type". The payload is an extension type byte plus some more data bytes; the data bytes are in a format to be defined by the extension type byte. Half of the possible extension byte values are reserved for future specification, and half are designated for application use. This is pleasingly straightforward. (There is also one unused primary initial byte value, but that would be rejected by existing decoders and doesn't seem like a likely direction for future expansion.)

Minor other differences in integer encoding

The encodings of integers differ. In MessagePack, signed and unsigned integers have different typecodes. In CBOR, signed and unsigned positive integers have the same typecodes; negative integers have a different set of typecodes. This means that a CBOR reader which knows it is expecting a signed value will have to do a top-bit-set check on the actual data value! And a CBOR writer must check the value to choose a typecode. MessagePack reserves fewer shortcodes for small negative integers than for small positive integers.
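For instance, here is how a couple of values come out in each format (my own worked example, checked by hand against the two specifications; not taken from the original post):

    fn main() {
        // MessagePack: signed and unsigned values have distinct typecodes.
        let mp_one: &[u8] = &[0x01];            // 1: positive fixint (0x00..=0x7f)
        let mp_neg_one: &[u8] = &[0xff];        // -1: negative fixint (0xe0..=0xff covers -32..=-1)
        let mp_neg_100: &[u8] = &[0xd0, 0x9c];  // -100: int8 typecode + two's-complement byte

        // CBOR: major type 0 is unsigned; major type 1 encodes -1 - n.
        let cbor_one: &[u8] = &[0x01];            // 1: major type 0, value 1
        let cbor_neg_one: &[u8] = &[0x20];        // -1: major type 1, n = 0
        let cbor_neg_100: &[u8] = &[0x38, 0x63];  // -100: major type 1, n = 99 in a following byte

        // So a CBOR reader expecting a signed number must branch on the major
        // type and compute -1 - n; a MessagePack reader just uses the typecode.
        println!("{mp_one:02x?} {mp_neg_one:02x?} {mp_neg_100:02x?}");
        println!("{cbor_one:02x?} {cbor_neg_one:02x?} {cbor_neg_100:02x?}");
    }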
Conclusions and lessons

MessagePack seems to have been prompted into fixing the missing string type problem, but only by the threat of a fork. However, this fork went ahead even after MessagePack clearly accepted the need for a string type. MessagePack had a fixed protocol spec before the IETF did. The continued pursuit of the IETF fork was ostensibly motivated by a disapproval of the development process and in particular a sense that the IETF process was superior. However, it seems to me that the IETF process was abused by CBOR's proponent, who just wanted things their own way. I have seen claims by IETF proponents that the open decisionmaking system inherently produces superior results. However, in this case the IETF process produced a bad specification. To the extent that other IETF contributors had influence over the ultimate CBOR RFC, I don't think they significantly improved it. CBOR has been described as MessagePack bikeshedded by the IETF. That would have been bad enough, but I think it's worse than that. To a large extent CBOR is one person's NIH-induced bad design rubber-stamped by the IETF. CBOR's problems are not simply matters of taste: it's significantly overcomplicated.

One lesson for the rest of us is that although being the upstream and nominally in charge of a project seems to give us a lot of power, it's wise to listen carefully to one's users and downstreams. Once people are annoyed enough to fork, the fork will have a life of its own.

Another lesson is that many of us should be much warier of the supposed moral authority of the IETF. Many IETF standards are awful (OAuth 2 [12]; IKE; DNSSEC; the list goes on). Sometimes (especially when network adoption effects are weak, as with MessagePack vs CBOR) better results can be obtained from a smaller group, or even an individual, who simply need the thing for their own uses.

Finally, governance systems of public institutions like the IETF need to be robust in defending the interests of outsiders (and hence of society at large) against eloquent insiders who know how to work the process machinery. Any institution which nominally serves the public good faces a constant risk of devolving into self-servingness. This risk gets worse the more powerful and respected the institution becomes.

References
  1. #13: First-class string type in serialization specification (MessagePack issue tracker, June 2010 - August 2013)
  2. #121: Msgpack can't differentiate between raw binary data and text strings (MessagePack issue tracker, November 2012 - February 2013)
  3. draft-bormann-apparea-bpack-00: The binarypack JSON-like representation format (IETF Internet-Draft, October 2012)
  4. #129: MessagePack should be developed in an open process (MessagePack issue tracker, February 2013 - March 2013)
  5. Re: JSON mailing list and BoF (IETF apps-discuss mailing list message from Carsten Bormann, 18 February 2013)
  6. #128: Discussions on the upcoming MessagePack spec that adds the string type to the protocol (MessagePack issue tracker, February 2013 - August 2013)
  7. draft-bormann-apparea-bpack-01: The binarypack JSON-like representation format (IETF Internet-Draft, February 2013)
  8. draft-bormann-cbor: Concise Binary Object Representation (CBOR) (IETF Internet-Drafts, May 2013 - September 2013)
  9. RFC 7049: Concise Binary Object Representation (CBOR) (October 2013)
  10. "MessagePack should be replaced with [CBOR] everywhere ..." (floatboth on Hacker News, 8th April 2017)
  11. Discussion with very useful set of history links (camgunz on Hacker News, 9th April 2017)
  12. OAuth 2.0 and the Road to Hell (Eran Hammer, blog posting from 2012, via Wayback Machine)
  13. Re: [apps-discuss] [Json] msgpack/binarypack (Re: JSON mailing list and BoF) (IETF list message from Sadyuki Furuhashi, 4th March 2013)
  14. "no apologies for complaining about this farce" (IETF list message from Phillip Hallam-Baker, 15th August 2013)
    Edited 2020-07-14 18:55 to fix a minor formatting issue, and 2020-07-14 22:54 to fix two typos



24 June 2020

Ian Jackson: Renaming the primary git branch to "trunk"

I have been convinced by the arguments that it's not nice to keep using the word master for the default git branch. Regardless of the etymology (which is unclear), some people say they have negative associations for this word. Changing this upstream in git is complicated on a technical level and, sadly, contested. But git is flexible enough that I can make this change in my own repositories. Doing so is not even so difficult. So:

Announcement

I intend to rename master to trunk in all repositories owned by my personal hat. To avoid making things very complicated for myself I will just delete refs/heads/master when I make this change. So there may be a little disruption to downstreams. (The commands involved are sketched at the end of this post.)

I intend to make this change everywhere eventually. But rather than front-loading the effort, I'm going to do this to repositories as I come across them anyway. That will allow me to update all the docs references, any automation, etc., at a point when I have those things in mind anyway. Also, doing it this way will allow me to focus my effort on the most active projects, and avoids me committing to a sudden large pile of fiddly clerical work. But: if you have an interest in any repository in particular that you want updated, please let me know so I can prioritise it.

Bikeshed

Why "trunk"? "Main" has been suggested elsewhere, and it is often a good replacement for "master" (for example, we can talk very sensibly about a disk's Main Boot Record, MBR). But "main" isn't quite right for the VCS case; for example a "main" branch ought to have better quality than is typical for the primary development branch. Conversely, there is much precedent for "trunk". "Trunk" was used to refer to this concept by at least SVN, CVS, RCS and CSSC (and therefore probably SCCS) - at least in the documentation, although in some of these cases the command line API didn't have a name for it. So "trunk" it is.

Aside: two other words - passlist, blocklist

People are (finally!) starting to replace "blacklist" and "whitelist". Seriously, why has it taken everyone this long? I have been using "blocklist" and "passlist" for these concepts for some time. They are drop-in replacements. I have also heard "allowlist" and "denylist" suggested, but they are cumbersome and cacophonous. Also "allow" and "deny" seem to more strongly imply an access control function than merely "pass" and "block", and the usefulness of passlists and blocklists extends well beyond access control: protocol compatibility and ABI filtering are a couple of other use cases.
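For reference, the rename itself can be done with standard git commands, something like this (a sketch; adjust the remote name, and warn your downstreams first):

    # in a working clone: rename the branch and push it under its new name
    git branch -m master trunk
    git push origin trunk

    # on the server's bare repository: make trunk the default (HEAD) branch
    git symbolic-ref HEAD refs/heads/trunk

    # finally, delete the old name (this is the disruptive part)
    git push origin --delete master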


18 June 2020

Ian Jackson: BountySource have turned evil - alternatives ?

I need an alternative to BountySource, who have done an evil thing. Please post recommendations in the comments.
From: Ian Jackson <*****>
To: support@bountysource.com
Subject: Re: Update to our Terms of Service
Date: Wed, 17 Jun 2020 16:26:46 +0100
Bountysource writes ("Update to our Terms of Service"):
> You are receiving this email because we are updating the Bountysource Terms of
> Service, effective 1st July 2020.
>
> What's changing?
> We have added a Time-Out clause to the Bounties section of the agreement:
>
> 2.13 Bounty Time-Out.
> If no Solution is accepted within two years after a Bounty is posted, then the
> Bounty will be withdrawn and the amount posted for the Bounty will be retained
> by Bountysource. For Bounties posted before June 30, 2018, the Backer may
> redeploy their Bounty to a new Issue by contacting support@bountysource.com
> before July 1, 2020. If the Backer does not redeploy their Bounty by the
> deadline, the Bounty will be withdrawn and the amount posted for the Bounty
> will be retained by Bountysource.
>
> You can read the full Terms of Service here
>
> What do I need to do?
> If you agree to the new terms, you don't have to do anything.
>
> If you have a bounty posted prior to June 30, 2018 that is not currently being
> solved, email us at support@bountysource.com to redeploy your bounty.  Or, if
> you do not agree with the new terms, please discontinue using Bountysource.
I do not agree to this change to the Terms and Conditions.
Accordingly, I will not post any more bounties on BountySource.
I currently have one outstanding bounty of $200 on
   https://www.bountysource.com/issues/86138921-rfe-add-a-frontend-for-the-rust-programming-language
That was posted in December 2019.  It is not clear now whether that
bounty will be claimed within your 2-year timeout period.
Since I have not accepted the T&C change, please can you confirm that
(i) My bounty will not be retained by BountySource even if no solution
    is accepted by December 2021.
(ii) As a backer, you will permit me to vote on acceptance of that
    bounty should a solution be proposed before then.
I suspect that you intend to rely on the term in the previous T&C
giving you unlimited ability to modify the terms and conditions.  Of
course such a term is an unfair contract term, because if it were
effective it would give you the power to do whatever you like.  So it
is not binding on me.
I look forward to hearing from you by the 8th of July.  If I do not
hear from you I will take the matter up with my credit card company.
Thank you for your attention.
Ian.
They will try to say "oh it's all governed by US law", but of course section 75 of the Consumer Credit Act makes the card company jointly liable for Bountysource's breach of contract, and a UK court will apply UK consumer protection law even to a contract which says it is to be governed by US law - because you can't contract out of consumer protection. So the card company are on the hook and I can use them as a lever.

Update - BountySource have changed their mind
From: Bountysource <support@bountysource.com>
To: *****
Subject: Re: Update to our Terms of Service
Date: Wed, 17 Jun 2020 18:51:11 -0700
Hi Ian
The new terms of service has been withdrawn.
This is not the end of the matter, I'm sure. They will want to get long-unclaimed bounties off their books (and having the cash sat forever at BountySource is not ideal for backers either). Hopefully they can engage in a dialogue and find a way that is fair, and that doesn't incentivise BountySource to sabotage bounty claims(!) I think that means that whatever it is, BountySource mustn't keep the money. There are established ways of dealing with similar problems (eg ancient charitable trusts; unclaimed bank accounts). I remain wary. That BountySource is now owned by a cryptocurrency company is not encouraging. That they would even try what they just did is a really bad sign.
Edited 2020-06-17 16:28 for a typo in Bountysource's email address
Update added 2020-06-18 11:40 for BountySource's change of mind.



29 April 2020

Ian Jackson: subdirmk 1.0 - ergonomic preprocessing assistant for non-recursive make

I have made the 1.0 release of subdirmk. subdirmk is a tool to help with writing build systems in make, without use of recursive make.

Why Peter Miller's 1997 essay Recursive Make Considered Harmful persuasively argues that it is better to arrange to have a single make invocation with the project's complete dependency tree, rather than the conventional $(MAKE) -C subdirectory approach. This has become much more relevant with modern projects, which tend to be large and have deep directory trees. Invoking make separately for each of these subdirectories can be very slow. Nowadays everyone needs to run a parallel build, but with the recursive make approach great discipline is needed to avoid introducing races which cause the build to sometimes fail. There are various new systems which aim to replace make. My general impression of these is that they mostly threw away the good parts of make (often, they discard the flexibility, and the use of the shell command as the basic unit of execution, making them hard to extend), or make other unfortunate assumptions. And there are a lot of programming-language-specific systems - a very unsatisfactory development. Having said all that, I admit I haven't properly evaluated every make competitor. Other reasons for staying with make include that it is widely available, relatively widely understood, and has a model relatively free of high-level abstract concepts. (I like my languages with high-level concepts, but not my build systems.) But, with make, I found that actually writing a project's build system in non-recursive make was not very ergonomic. So with some help and prompting from Mark Wooding, I have made a tool to help.

What subdirmk is a makefile preprocessor and aggregator, typically run from autoconf. subdirmk provides convenience syntaxes for references to per-directory variables and pathnames. It also helps by providing a little syntactic sugar for GNU make's macro facilities, which are awkward to use in raw make. subdirmk's features are triggered by the sigil &. The syntax is carefully designed to avoid getting in the way of makefile programming (and programming of shell commands in make rules). subdirmk is fully documented in the README. There is a demo in the example directory (which also serves as part of the test suite).

What's new The version number. I have not felt the need to make any changes since releasing 0.4 in mid-February. The last non-docs change was a (backwards-compatible) extension, in late January, to pass through unaltered GNU make's new grouped multiple targets syntax.

Advantages and disadvantages of subdirmk Compared to recursive make, subdirmk is easier and simpler, although you do have to decorate a lot of your variables and filenames with & to indicate that they are directory-local. It is much easier to avoid writing parallel make bugs. You naturally get properly working per-subdirectory targets. subdirmk-based nonrecursive make is much, much faster than recursive make. Compared to many other recent build system tools, subdirmk retains all the flexibility and extensibility of make, and operates at a fairly low level of abstraction. subdirmk-based makefiles can easily invoke other build systems. make knows it's not the only thing in the universe. You can adopt subdirmk incrementally or partially, gradually bringing your recursive submakefiles into the unified build. The build system code in subdirmk's Dir.sd.mk files will be readily navigable by most readers; much will be familiar.
Because subdirmk is a small collection of (fairly simple) scripting and makefile code, there is no need to build it; you can simply ship it with your project using git-subtree. For an autoconf-based project, there need be no change to how your users and downstreams invoke your build. On the other hand, the price you (continue to) pay is make's punctuation soup, to which subdirmk adds a new sigil. subdirmk-based makefiles are terse and help you use make's facilities to abstract away repetition, but that can make them dense. The new & sigil will faze some readers. Currently, the provided mechanism for incorporating subdirmk into your project assumes you are using autoconf but not automake. It would be possible to use subdirmk with autoconf-less projects, or with automake-based ones, but I haven't done the glue work to make that easy. subdirmk does require GNU make and it assumes you have perl installed. But GNU make is very portable, and perl is very widely available. (The perl used is very conservative.) The make competitors are, themselves, even less standard build tools. I don't think a build-dependency on GNU make, or perl, is a significant barrier nowadays, for most projects.

Note about comment moderation I have deliberately been vague about other build systems and avoided specific criticisms or references. I don't want the comments to become a build system advocacy debate. Comments may be screened and moderated accordingly. Pointers to other obscure build system tools are very welcome. If you want to write a survey of build tools, or a critique of subdirmk, please do so on your own blog; I would be happy to consider linking to it.


15 April 2020

Ian Jackson: Adapter to use camera tripod as a microphone stand

People complained that my laptop sound was buzzy, so I bought a proper microphone (thanks to mdw for advice, and loan of some kit). Proper microphones come with a holder that accepts a screw from your microphone stand. In my case, 3/8" "BSW" - similar to 3/8" US standard "coarse" (or, if you unscrew a supplied insert, a special 27tpi 5/8" UNS). I don't have a microphone stand and I didn't want to buy one. I have two camera tripods - a small one and a big one. My camera tripods have 1/4" coarse (20tpi) UNC screws. Also, we have a 3D printer at home. And nowadays you can download a configurable screw thread from the internet. So I made myself an adapter. After a few iterations I have a pretty good object which can be used to fit my new microphone to either camera tripod. (I found that a UNC thread was close enough to fit the microphone's BSW.) Of course, this is Open Hardware and the (short) source code is available. (To build it you'll want to git clone to get the included files.) Pictures below the cut. The flange at the bottom is because tripods usually have a soft top which is supposed to stop scratching the camera; a narrow hexagonal base would make gouges, hence the wide base.


3 March 2020

Ian Jackson: Let's Encrypt certificate revocation - panic now!

Let's Encrypt have rather quietly announced (sadly, requires Discourse JS!) that they are going to revoke a very large number of certificates. These revocations will start "no earlier than" 00:00 UTC tonight (24:00 on the 3rd of March), a little over 9h from now. Affected websites etc. may stop working. I discovered this at about lunchtime UK time today; two of my certs were affected. xenproject.org and linuxfoundation.org are listed as affected and I am trying to get in touch with the hosting provider to get it fixed. One of the domains we in the Xen Project run ourselves, with the help of the contractors who do much of our sysadmin, is affected - and those contractors (who are very competent) didn't know until I told them. tl;dr: If you are responsible for any Let's Encrypt certificates, check whether they are affected right away and maybe panic now!
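As a quick first check (a sketch only, not a replacement for Let's Encrypt's own checking tool; the hostname here is illustrative), you can print the serial number your site is currently serving and compare it against the published list of affected serials:
import socket
import ssl

# Print the serial number of the certificate a host is currently serving,
# for comparison against the list of affected serial numbers.
def cert_serial(host, port=443):
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port)) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["serialNumber"]

print(cert_serial("www.example.org"))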
edited 2020-03-03 15:35 to fix arithmetic error


17 October 2017

Antoine Beaupré: A comparison of cryptographic keycards

An earlier article showed that private key storage is an important problem to solve in any cryptographic system and established keycards as a good way to store private key material offline. But which keycard should we use? This article examines the form factor, openness, and performance of four keycards to try to help readers choose the one that will fit their needs. I have personally been using a YubiKey NEO, since a 2015 announcement on GitHub promoting two-factor authentication. I was also able to hook up my SSH authentication key into the YubiKey's 2048-bit RSA slot. It seemed natural to move the other subkeys onto the keycard, provided that performance was sufficient. The mail client that I use (Notmuch) blocks when decrypting messages, which could be a serious problem on large email threads from encrypted mailing lists. So I built a test harness and got access to some more keycards: I bought an FST-01 from its creator, Yutaka Niibe, at the last DebConf, and Nitrokey donated a Nitrokey Pro. I also bought a YubiKey 4 when I got the NEO. There are of course other keycards out there, but those are the ones I could get my hands on. You'll notice none of those keycards have a physical keypad to enter passwords, so they are all vulnerable to keyloggers that could extract the key's PIN. Keep in mind, however, that even with the PIN, an attacker could only ask the keycard to decrypt or sign material but not extract the key that is protected by the card's firmware.

Form factor [Photo: the Nitrokey Pro, YubiKey NEO (worn out), YubiKey 4, and FST-01] The four keycards have similar form factors: they all connect to a standard USB port, although both YubiKey keycards have a capacitive button by which the user triggers two-factor authentication and the YubiKey 4 can also require a button press to confirm private key use. The YubiKeys feel sturdier than the other two. The NEO has withstood two years of punishment in my pockets along with the rest of my "real" keyring and there is only minimal wear on the keycard in the picture. It's also thinner, so it fits well on the keyring. The FST-01 stands out from the others with its minimal design. Out of the box, the FST-01 comes without a case, so the circuitry is exposed. This is deliberate: one of its goals is to be as transparent as possible, both in terms of software and hardware design, and you definitely get that feeling at the physical level. Unfortunately, that does mean it feels more brittle than the other models: I wouldn't carry it in my pocket all the time, although there is a case that may protect the key a little better; it does not, however, provide an easy way to hook it into a keyring. In the group picture above, the FST-01 is the pink plastic thing, which is a rubbery casing I received along with the device when I got it. Notice how the USB connectors of the YubiKeys differ from the other two: while the FST-01 and the Nitrokey have standard USB connectors, the YubiKey has only a "half-connector", which is what makes it thinner than the other two. The "Nano" form factor takes this even further and almost disappears in the USB port. Unfortunately, this arrangement means the YubiKey NEO often comes loose and falls out of the USB port, especially when connected to a laptop. On my workstation, however, it usually stays put even with my whole keyring hanging off of it. I suspect this adds more strain to the host's USB port but that's a tradeoff I've lived with without any noticeable wear so far. Finally, the NEO has this peculiar feature of supporting NFC for certain operations, as LWN previously covered, but I haven't used that feature yet. The Nitrokey Pro looks like a normal USB key, in contrast with the other two devices. It does feel a little brittle when compared with the YubiKey, although only time will tell how much of a beating it can take. It has a small ring in the case so it is possible to carry it directly on your keyring, but I would be worried the cap would come off eventually. Nitrokey devices are also two times thicker than the Yubico models, which makes them less convenient to carry around on keyrings.

Open and closed designs The FST-01 is as open as hardware comes, down to the PCB design available as KiCad files in this Git repository. The software running on the card is the Gnuk firmware that implements the OpenPGP card protocol, but you can also get it with firmware implementing a true random number generator (TRNG) called NeuG (pronounced "noisy"); the device is programmable through a standard Serial Wire Debug (SWD) port. The Nitrokey Start model also runs the Gnuk firmware. However, the Nitrokey website announces only ECC and RSA 2048-bit support for the Start, while the FST-01 also supports RSA-4096. Nitrokey's founder Jan Suhr, in a private email, explained that this is because "Gnuk doesn't support RSA-3072 or larger at a reasonable speed". Its devices (the Pro, Start, and HSM models) use a similar chip to the FST-01: the STM32F103 microcontroller. [Photo: Nitrokey Pro with STM32F103TBU6 MCU] Nitrokey also publishes its hardware designs, on GitHub, which shows the Pro is basically a fork of the FST-01, according to the ChangeLog. I opened the case to confirm it was using the STM MCU, something I should warn you against; I broke one of the pins holding it together when opening it, so now it's even more fragile. But at least I was able to confirm it was built using the STM32F103TBU6 MCU, like the FST-01. [Photo: Nitrokey back side] But this is where the comparison ends: on the back side, we find a SIM card reader that holds the OpenPGP card that, in turn, holds the private key material and does the cryptographic operations. So, in effect, the Nitrokey Pro is really an evolution of the original OpenPGP card readers. Nitrokey confirmed the OpenPGP card featured in the Pro is the same as the one shipped by the Free Software Foundation Europe (FSFE): the BasicCard built by ZeitControl. Those cards, however, are covered by NDAs and the firmware is only partially open source. This makes the Nitrokey Pro less open than the FST-01, but that's an inevitable tradeoff when choosing a design based on the OpenPGP cards, which Suhr described to me as "pretty proprietary". There are other keycards out there, however, for example the SLJ52GDL150-150k smartcard suggested by Debian developer Yves-Alexis Perez, which he prefers as it is certified by French and German authorities. In that blog post, he also said he was experimenting with the GPL-licensed OpenPGP applet implemented by the French ANSSI. But the YubiKey devices are even further away in the closed-design direction. Both the hardware designs and firmware are proprietary. The YubiKey NEO, for example, cannot be upgraded at all, even though it is based on an open firmware. According to Yubico's FAQ, this is due to "best security practices": "There is a 'no upgrade' policy for our devices since nothing, including malware, can write to the firmware." I find this decision questionable in a context where security updates are often more important than trying to design a bulletproof device, which may simply be impossible. And the YubiKey NEO did suffer from a critical security issue that allowed attackers to bypass the PIN protection on the card, which raises the question of the actual protection of the private key material on those cards. According to Niibe, "some OpenPGP cards store the private key unencrypted. It is a common attitude for many smartcard implementations", which was confirmed by Suhr: "the private key is protected by hardware mechanisms which prevent its extraction and misuse". He is referring to the use of tamper resistance.
After that security issue, there was no other option for YubiKey NEO users than to get a new keycard (for free, thankfully) from Yubico, which also meant discarding the private key material on the key. For OpenPGP keys, this may mean having to bootstrap the web of trust from scratch if the keycard was responsible for the main certification key. But at least the NEO is running free software based on the OpenPGP card applet and the source is still available on GitHub. The YubiKey 4, on the other hand, is now closed source, which was controversial when the new model was announced last year. It led the main Linux Foundation system administrator, Konstantin Ryabitsev, to withdraw his endorsement of Yubico products. In response, Yubico argued that this approach was essential to the security of its devices, which are now based on "a secure chip, which has built-in countermeasures to mitigate a long list of attacks". In particular, it claims that:
A commercial-grade AVR or ARM controller is unfit to be used in a security product. In most cases, these controllers are easy to attack, from breaking in via a debug/JTAG/TAP port to probing memory contents. Various forms of fault injection and side-channel analysis are possible, sometimes allowing for a complete key recovery in a shockingly short period of time.
While I understand those concerns, they eventually come down to the trust you have in an organization. Not only do we have to trust Yubico, but also the hardware manufacturers and designs it has chosen. Every step in the hidden supply chain is then trusted to make correct technical decisions and not introduce any backdoors. History, unfortunately, is not on Yubico's side: Snowden revealed the example of RSA Security accepting what renowned cryptographer Bruce Schneier described as a "bribe" from the NSA to weaken its ECC implementation by using the presumably backdoored Dual_EC_DRBG algorithm. What makes Yubico or its suppliers so different from RSA Security? Remember that RSA Security used to be an adamant opponent of the degradation of encryption standards, campaigning against the Clipper chip in the first crypto wars. Even if we trust the Yubico supply chain, how can we trust a closed design using what basically amounts to security through obscurity? Publicly auditable designs are an important tradition in cryptography, and that principle shouldn't stop when software is frozen into silicon. In fact, a critical vulnerability called ROCA, disclosed recently, affects closed "smartcards" like the YubiKey 4 and allows full private key recovery from the public key if the key was generated on a vulnerable keycard. When speaking with Ars Technica, the researchers outlined the importance of open designs and questioned the reliability of certification:
Our work highlights the dangers of keeping the design secret and the implementation closed-source, even if both are thoroughly analyzed and certified by experts. The lack of public information causes a delay in the discovery of flaws (and hinders the process of checking for them), thereby increasing the number of already deployed and affected devices at the time of detection.
This issue with open hardware designs seems to be a recurring topic of conversation on the Gnuk mailing list. For example, there was a discussion in September 2017 regarding possible hardware vulnerabilities in the STM MCU that would allow extraction of encrypted key material from the key. Niibe referred to a talk presented at the WOOT 17 workshop, where Johannes Obermaier and Stefan Tatschner, from the Fraunhofer Institute, demonstrated attacks against the STM32F0 family of MCUs. It is still unclear if those attacks also apply to the older STM32F1 design used in the FST-01, however. Furthermore, extracted private key material is still protected by the user's passphrase, but Gnuk uses a weak key derivation function, so brute-forcing attacks may be possible. Fortunately, there is work in progress to make GnuPG hash the passphrase before sending it to the keycard, which should make such attacks harder, if not completely pointless. When asked about the Yubico claims in a private email, Niibe did recognize that "it is true that there are more weak points in general purpose implementations than special implementations". During the last DebConf in Montreal, Niibe explained:
If you don't trust me, you should not buy from me. Source code availability is only a single factor: someone can maliciously replace the firmware to enable advanced attacks.
Niibe's recommendation is to "build the firmware yourself", also saying the design of the FST-01 uses normal hardware that "everyone can replicate". Those advantages are hard to deny for a cryptographic system: using more generic components makes it harder for hostile parties to mount targeted attacks. A counter-argument here is that it can be difficult for a regular user to audit such designs, let alone physically build the device from scratch. But, in a mailing list discussion, Debian developer Ian Jackson explained that:
You don't need to be able to validate it personally. The thing spooks most hate is discovery. Backdooring supposedly-free hardware is harder (more costly) because it comes with greater risk of discovery. To put it concretely: if they backdoor all of them, someone (not necessarily you) might notice. (Backdooring only yours involves messing with the shipping arrangements and so on, and supposes that you specifically are of interest.)
Since, as far as we know, the STM microcontrollers are not backdoored, I would tend to favor those devices over proprietary ones, as such a backdoor would be more easily detectable than in a closed design. Even though physical attacks may be possible against those microcontrollers, in the end, if an attacker has physical access to a keycard, I consider the key compromised, even if it has the best chip on the market. In our email exchange, Niibe argued that "when a token is lost, it is better to revoke keys, even if the token is considered secure enough". So, like any other device, physical compromise of tokens may mean compromise of the key and should trigger key-revocation procedures.

Algorithms and performance To establish reliable performance results, I wrote a benchmark program naively called crypto-bench that could produce comparable results between the different keys. The program takes each algorithm/keycard combination and runs 1000 decryptions of a 16-byte file (one AES-128 block) using GnuPG, after priming it to get the password cached. I assume the overhead of GnuPG calls to be negligible, as it should be the same across all tokens, so comparisons are possible. AES encryption is constant across all tests as it is always performed on the host and fast enough to be irrelevant in the tests. I used the following:
  • Intel(R) Core(TM) i3-6100U CPU @ 2.30GHz running Debian 9 ("stretch"/stable amd64), using GnuPG 2.1.18-6 (from the stable Debian package)
  • Nitrokey Pro 0.8 (latest firmware)
  • FST-01, running Gnuk version 1.2.5 (latest firmware)
  • YubiKey NEO OpenPGP applet 1.0.10 (not upgradable)
  • YubiKey 4 4.2.6 (not upgradable)
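For flavour, here is a minimal sketch of the measurement loop described above - not the actual crypto-bench program; the ciphertext filename is illustrative, and it assumes the file was encrypted to the key under test and that the passphrase has already been primed:
import statistics
import subprocess
import time

RUNS = 1000  # decryptions of one 16-byte file, as described above

def bench_decrypt(path="block.gpg", runs=RUNS):
    # Time repeated GnuPG decryptions and report the mean.
    samples = []
    for _ in range(runs):
        t0 = time.monotonic()
        subprocess.run(["gpg", "--quiet", "--batch", "--decrypt", path],
                       stdout=subprocess.DEVNULL, check=True)
        samples.append(time.monotonic() - t0)
    return statistics.mean(samples)

print("mean decryption time: %.3fs" % bench_decrypt())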
I ran crypto-bench for each keycard, which resulted in the following:
Algorithm        Device        Mean time (s)
ECDH-Curve25519  CPU           0.036
ECDH-Curve25519  FST-01        0.135
RSA-2048         CPU           0.016
RSA-2048         YubiKey-4     0.162
RSA-2048         Nitrokey-Pro  0.610
RSA-2048         YubiKey-NEO   0.736
RSA-2048         FST-01        1.265
RSA-4096         CPU           0.043
RSA-4096         YubiKey-4     0.875
RSA-4096         Nitrokey-Pro  3.150
RSA-4096         FST-01        8.218
[Graph: mean decryption times] There we see the performance of the four keycards I tested, compared with the same operations done without a keycard: the "CPU" device. That provides the baseline time of GnuPG decrypting the file. The first obvious observation is that using a keycard is slower: in the best scenario (FST-01 + ECC) we see a four-fold slowdown, but in the worst case (also FST-01, but RSA-4096), we see a catastrophic 200-fold slowdown. When I presented the results on the Gnuk mailing list, GnuPG developer Werner Koch confirmed those "numbers are as expected":
With a crypto chip RSA is much faster. By design the Gnuk can't be as fast - it is just a simple MCU. However, using Curve25519 Gnuk is really fast.
And yes, the FST-01 is really fast at doing ECC, but it's also the only keycard that handles ECC in my tests; the Nitrokey Start and Nitrokey HSM should support it as well, but I haven't been able to test those devices. Also note that the YubiKey NEO doesn't support RSA-4096 at all, so we can only compare RSA-2048 across keycards. We should note, however, that ECC is slower than RSA on the CPU, which suggests the Gnuk ECC implementation used by the FST-01 is exceptionally fast. In discussions about improving the performance of the FST-01, Niibe estimated the user tolerance threshold to be "2 seconds decryption time". In a new design using the STM32L432 microcontroller, Aurélien Jarno was able to bring the numbers for RSA-2048 decryption from 1.27s down to 0.65s, and for RSA-4096, from 8.22s down to 3.87s. RSA-4096 is still beyond the two-second threshold, but at least it brings the FST-01 close to the YubiKey NEO and Nitrokey Pro performance levels. We should also underline the superior performance of the YubiKey 4: whatever that thing is doing, it's doing it faster than anyone else. It does RSA-4096 faster than the FST-01 does RSA-2048, and almost as fast as the Nitrokey Pro does RSA-2048. We should also note that the Nitrokey Pro also fails to cross the two-second threshold for RSA-4096 decryption. For me, the FST-01's stellar performance with ECC outshines the other devices. Maybe it says more about the efficiency of the algorithm than the FST-01 or Gnuk's design, but it's definitely an interesting avenue for people who want to deploy those modern algorithms. So, in terms of performance, it is clear that both the YubiKey 4 and the FST-01 take the prize in their own areas (RSA and ECC, respectively).

Conclusion In the above presentation, I have evaluated four cryptographic keycards for use with various OpenPGP operations. What the results show is that the only efficient way of storing a 4096-bit encryption key on a keycard would be to use the YubiKey 4. Unfortunately, I do not feel we should put our trust in such closed designs, so I would argue you should either stick with 2048-bit encryption subkeys or keep the keys on disk. Considering that losing such a key would be catastrophic, this might be a good approach anyway. You should also consider switching to ECC encryption: even though it may not be supported everywhere, GnuPG supports having multiple encryption subkeys on a keyring: if one algorithm is unsupported (e.g. GnuPG 1.4 doesn't support ECC), it will fall back to a supported algorithm (e.g. RSA). Do not forget your previously encrypted material doesn't magically re-encrypt itself using your new encryption subkey, however. For authentication and signing keys, speed is not such an issue, so I would warmly recommend either the Nitrokey Pro or Start, or the FST-01, depending on whether you want to start experimenting with ECC algorithms. Availability also seems to be an issue for the FST-01. While you can generally get the device for a few bucks when you meet Niibe in person (I bought mine for around $30 Canadian), the Seeed online shop says the device is out of stock at the time of this writing, even though Jonathan McDowell said that may be inaccurate in a debian-project discussion. Nevertheless, this issue may make the Nitrokey devices more attractive. When deciding on using the Pro or Start, Suhr offered the following advice:
In practice smart card security has been proven to work well (at least if you use a decent smart card). Therefore the Nitrokey Pro should be used for high security cases. If you don't trust the smart card or if Nitrokey Start is just sufficient for you, you can choose that one. This is why we offer both models.
So far, I have created a signing subkey and moved that and my authentication key to the YubiKey NEO, because it's a device I physically trust to keep itself together in my pockets and I was already using it. It has served me well so far, especially with its extra features like U2F and HOTP support, which I use frequently. Those features are also available on the Nitrokey Pro, so that may be an alternative if I lose the YubiKey. I will probably move my main certification key to the FST-01 and a LUKS-encrypted USB disk, to keep that certification key offline but backed up on two different devices. As for the encryption key, I'll wait for keycard performance to improve, or simply switch my whole keyring to ECC and use the FST-01 or Nitrokey Start for that purpose.
[The author would like to thank Nitrokey for providing hardware for testing.] This article first appeared in the Linux Weekly News.

25 September 2017

Chris Lamb: Lintian: We are all Perl developers now

Lintian is a static analysis tool for Debian packages, reporting on various errors, omissions and general quality-assurance issues to maintainers. I've previously written about my exploits with Lintian as well as authoring a short tutorial on how to write your own Lintian check. Anyway, I recently uploaded version 2.5.53, about two months after the previous release. The biggest changes you may notice are support for the latest version of the Debian Policy, as well as the addition of checks to encourage the migration to Python 3. Thanks to all who contributed patches, code review and bug reports to this release. The full changelog is as follows:
lintian (2.5.53) unstable; urgency=medium
  The "we are all Perl developers now" release.
  * Summary of tag changes:
    + Added:
      - alternatively-build-depends-on-python-sphinx-and-python3-sphinx
      - build-depends-on-python-sphinx-only
      - dependency-on-python-version-marked-for-end-of-life
      - maintainer-script-interpreter
      - missing-call-to-dpkg-maintscript-helper
      - node-package-install-in-nodejs-rootdir
      - override-file-in-wrong-package
      - package-installs-java-bytecode
      - python-foo-but-no-python3-foo
      - script-needs-depends-on-sensible-utils
      - script-uses-deprecated-nodejs-location
      - transitional-package-should-be-oldlibs-optional
      - unnecessary-testsuite-autopkgtest-header
      - vcs-browser-links-to-empty-view
    + Removed:
      - debug-package-should-be-priority-extra
      - missing-classpath
      - transitional-package-should-be-oldlibs-extra
  * checks/apache2.pm:
    + [CL] Fix an apache2-unparsable-dependency false positive by allowing
      periods (".") in dependency names.  (Closes: #873701)
  * checks/binaries.pm:
    + [CL] Apply patches from Guillem Jover & Boud Roukema to improve the
      description of the binary-file-built-without-LFS-support tag.
      (Closes: #874078)
  * checks/changes.{pm,desc}:
    + [CL] Ignore DFSG-repacked packages when checking for upstream
      source tarball signatures as they will never match by definition.
      (Closes: #871957)
    + [CL] Downgrade severity of orig-tarball-missing-upstream-signature
      from "E:" to "W:" as many common tools do not make including the
      signatures easy enough right now.  (Closes: #870722, #870069)
    + [CL] Expand the explanation of the
      orig-tarball-missing-upstream-signature tag to include the location
      of where dpkg-source will look. Thanks to Theodore Ts'o for the
      suggestion.
  * checks/copyright-file.pm:
    + [CL] Address a number of issues in copyright-year-in-future:
      - Prevent false positives in port numbers, email addresses, ISO
        standard numbers and matching specific and general street
        addresses.  (Closes: #869788)
      - Match all violating years in a line, not just the first (eg.
        "2000-2107").
      - Ignore meta copyright statements such as "Original Author". Thanks
        to Thorsten Alteholz for the bug report.  (Closes: #873323)
      - Expand testsuite.
  * checks/cruft.{pm,desc}:
    + [CL] Downgrade severity of file-contains-fixme-placeholder
      tag from "important" (ie. "E:") to "wishlist" (ie. "I:").
      Thanks to Gregor Herrmann for the suggestion.
    + [CL] Apply patch from Alex Muntada (alexm) to use "substr" instead
      of "substring" in mentions-deprecated-usr-lib-perl5-directory's
      description.  (Closes: #871767)
    + [CL] Don't check copyright_hints file for FIXME placeholders.
      (Closes: #872843)
    + [CL] Don't match quoted "FIXME" variants as they are almost always
      deliberate. Thanks to Adrian Bunk for the report.  (Closes: #870199)
    + [CL] Avoid false positives in missing source checks for "CSS Browser
      Selector".  (Closes: #874381)
  * checks/debhelper.pm:
    + [CL] Prevent a false positive of
      missing-build-dependency-for-dh_-command that can be exposed by
      following the advice for the recently added
      useless-autoreconf-build-depends tag.  (Closes: #869541)
  * checks/debian-readme.{pm,desc}:
    + [CL] Ensure readme-debian-contains-debmake-template also checks
      for templates "Automatically generated by debmake".
  * checks/description.{desc,pm}:
    + [CL] Clarify explanation of description-starts-with-leading-spaces
      tag. Thanks to Taylor Kline  for the report
      and patch.  (Closes: #849622)
    + [NT] Skip capitalization-error-in-description-synopsis for
      auto-generated packages (such as dbgsym packages).
  * checks/fields.{desc,pm}:
    + [CL] Ensure that python3-foo packages have "Section: python", not
      just python2-foo.  (Closes: #870272)
    + [RG] No longer require debug packages to be priority extra.
    + [BR] Use Lintian::Data for name/section mapping
    + [CL] Check for packages including "?rev=0&sc=0" in Vcs-Browser.
      (Closes: #681713)
    + [NT] Transitional packages should now be "oldlibs/optional" rather
      than "oldlibs/extra".  The related tag has been renamed accordingly.
  * checks/filename-length.pm:
    + [NT] Skip the check on auto-generated binary packages (such as
      dbgsym packages).
  * checks/files.{pm,desc}:
    + [BR] Avoid privacy-breach-generic false positives for legal.xml.
    + [BR] Detect install of node package under /usr/lib/nodejs/[^/]*$
    + [CL] Check for packages shipping compiled Java class files. Thanks
      Carnë Draug.  (Closes: #873211)
    + [BR] Privacy breach is no longer experimental.
  * checks/init.d.desc:
    + [RG] Do not recommend a versioned dependency on lsb-base in
      init.d-script-needs-depends-on-lsb-base.  (Closes: #847144)
  * checks/java.pm:
    + [CL] Additionally consider .cljc files as code to avoid false-
      positive codeless-jar warnings.  (Closes: #870649)
    + [CL] Drop problematic missing-classpath check.  (Closes: #857123)
  * checks/menu-format.desc:
    + [CL] Prevent false positives in desktop-entry-lacks-keywords-entry
      for "Link" and "Directory" .desktop files.  (Closes: #873702)
  * checks/python.{pm,desc}:
    + [CL] Split out Python checks from "scripts" check to a new source
      check of type "source".
    + [CL] Check for python-foo without corresponding python3-foo packages
      to assist in Python 2.x deprecation.  (Closes: #870681)
    + [CL] Check for packages that Build-Depend on python-sphinx only.
      (Closes: #870730)
    + [CL] Check for packages that alternatively Build-Depend on the
      Python 2 and Python 3 versions of Sphinx.  (Closes: #870758)
    + [CL] Check for binary packages that depend on Python 2.x.
      (Closes: #870822)
  * checks/scripts.pm:
    + [CL] Correct false positives in
      unconditional-use-of-dpkg-statoverride by detecting "if !" as a
      valid shell prefix.  (Closes: #869587)
    + [CL] Check for missing calls to dpkg-maintscript-helper(1) in
      maintainer scripts.  (Closes: #872042)
    + [CL] Check for packages using sensible-utils without declaring a
      dependency after its split from debianutils.  (Closes: #872611)
    + [CL] Warn about scripts using "nodejs" as an interpreter now that
      nodejs provides /usr/bin/node.  (Closes: #873096)
    + [BR] Add a statistic tag giving interpreter.
  * checks/testsuite.{desc,pm}:
    + [CL] Remove recommendations to add a "Testsuite: autopkgtest" field
      to debian/control as it is added when needed by dpkg-source(1)
      since dpkg 1.17.1.  (Closes: #865531)
    + [CL] Warn if we see an unnecessary "Testsuite: autopkgtest" header
      in debian/control.
    + [NT] Recognise "autopkgtest-pkg-go" as a valid test suite.
    + [CL] Recognise "autopkgtest-pkg-elpa" as a valid test suite.
      (Closes: #873458)
    + [CL] Recognise "autopkgtest-pkg-octave" as a valid test suite.
      (Closes: #875985)
    + [CL] Update the description of unknown-testsuite to reflect that
      "autopkgtest" is not the only valid value; the referenced URL
      is out-of-date (filed as #876008).  (Closes: #876003)
  * data/binaries/embedded-libs:
    + [RG] Detect embedded copies of heimdal, libgxps, libquicktime,
      libsass, libytnef, and taglib.
    + [RG] Use an additional string to detect embedded copies of
      openjpeg2.  (Closes: #762956)
  * data/fields/name_section_mappings:
    + [BR] node- package section is javascript.
    + [CL] Apply patch from Guillem Jover to add more section mappings.
      (Closes: #874121)
  * data/fields/obsolete-packages:
    + [MR] Add dh-systemd.  (Closes: #872076)
  * data/fields/perl-provides:
    + [CL] Refresh perl provides.
  * data/fields/virtual-packages:
    + [CL] Update data file from archive. This fixes a false positive for
      "bacula-director".  (Closes: #835120)
  * data/files/obsolete-paths:
    + [CL] Add note to /etc/bash_completion.d entry regarding stricter
      filename requirements.  (Closes: #814599)
  * data/files/privacy-breaker-websites:
    + [BR] Detect custom donation logos like apache.
    + [BR] Detect generic counter website.
  * data/standards-version/release-dates:
    + [CL] Add 4.0.1 and 4.1.0 as known standards versions.
      (Closes: #875509)
  * debian/control:
    + [CL] Mention Debian Policy v4.1.0 in the description.
    + [CL] Add myself to Uploaders.
    + [CL] Drop unnecessary "Testsuite: autopkgtest"; this is implied from
      debian/tests/control existing.
  * commands/info.pm:
    + [CL] Add a --list-tags option to print all tags Lintian knows about.
      Thanks to Rajendra Gokhale for the suggestion.  (Closes: #779675)
  * commands/lintian.pm:
    + [CL] Apply patch from Maia Everett to avoid British spelling when
      using en_US locale.  (Closes: #868897)
  * lib/Lintian/Check.pm:
    + [CL] Stop emitting {maintainer,uploader}-address-causes-mail-loops
      for @packages.debian.org addresses.  (Closes: #871575)
  * lib/Lintian/Collect/Binary.pm:
    + [NT] Introduce an "auto-generated" argument for "is_pkg_class".
  * lib/Lintian/Data.pm:
    + [CL] Modify Lintian::Data's "all" to always return keys in insertion
      order, dropping dependency on libtie-ixhash-perl.
  * helpers/coll/objdump-info-helper:
    + [CL] Apply patch from Steve Langasek to accommodate binutils 2.29
      outputting symbols in a different format on ppc64el.
      (Closes: #869750)
  * t/tests/fields-perl-provides/tags:
    + [CL] Update expected output to match new Perl provides.
  * t/tests/files-privacybreach/*:
    + [CL] Add explicit test for packages including external fonts via
      the Google Font API. Thanks to Ian Jackson for the report.
      (Closes: #873434)
    + [CL] Add explicit test for packages including external fonts via
      the Typekit API via <script/> HTML tags.
  * t/tests/*/desc:
    + [CL] Add missing entries in "Test-For" fields to make
      development/testing workflow less error-prone.
  * private/generate-tag-summary:
    + [CL] git-describe(1) will usually emit 7 hexadecimal digits as the
      abbreviated object name.  However, as this can be user-dependent,
      pass --abbrev=0 to ensure it does not vary between systems.  This
      also means we do not need to strip it ourselves.
  * private/refresh-*:
    + [CL] Use deb.debian.org as the default mirror.
    + [CL] Update locations of Contents-<arch> files; they are now
      namespaced by distribution (eg. "main").
 -- Chris Lamb <lamby@debian.org>  Wed, 20 Sep 2017 09:25:06 +0100

29 August 2017

Colin Watson: env chdir

I was recently asked to sort things out so that snap builds on Launchpad could themselves install snaps as build-dependencies. To make this work we need to start doing builds in LXD containers rather than in chroots. As a result I've been doing some quite extensive refactoring of launchpad-buildd: it previously had the assumption that it was going to use a chroot for everything baked into lots of untested helper shell scripts, and I've been rewriting those in Python with unit tests and with a single Backend abstraction that isolates the high-level logic from the details of where each build is being performed. This is all interesting work in its own right, but it's not what I want to talk about here. While I was doing all this refactoring, I ran across a couple of methods I wrote a while back which looked something like this:
def chroot(self, args, echo=False):
    """Run a command in the chroot.
    :param args: the command and arguments to run.
    """
    args = set_personality(
        args, self.options.arch, series=self.options.series)
    if echo:
        print("Running in chroot: %s" %
              ' '.join("'%s'" % arg for arg in args))
        sys.stdout.flush()
    subprocess.check_call([
        "/usr/bin/sudo", "/usr/sbin/chroot", self.chroot_path] + args)
def run_build_command(self, args, env=None, echo=False):
    """Run a build command in the chroot.
    This is unpleasant because we need to run it in /build under sudo
    chroot, and there's no way to do this without either a helper
    program in the chroot or unpleasant quoting.  We go for the
    unpleasant quoting.
    :param args: the command and arguments to run.
    :param env: dictionary of additional environment variables to set.
    """
    args = [shell_escape(arg) for arg in args]
    if env:
        args = ["env"] + [
            "%s=%s" % (key, shell_escape(value))
            for key, value in env.items()] + args
    command = "cd /build && %s" % " ".join(args)
    self.chroot(["/bin/sh", "-c", command], echo=echo)
(I've already replaced the chroot method with a call to Backend.run, but it's easier to see what I'm talking about in the original form.) One thing to notice about this code is that it uses several adverbial commands: that is, commands that run another command in a different way. For example, sudo runs another command as another user, while chroot runs another command with a different root directory, and env runs another command with different environment variables set. These commands chain neatly, and they also have the useful property that they take the subsidiary command and its arguments as a list of arguments. coreutils has several other commands that behave this way, and adverbio is another useful example. By contrast, su -c is something you might call a quasi-adverbial command: it does modify the behaviour of another command, but it takes it as a single argument which it then passes to sh -c. Every time you have something that's passed to a shell like this, you need a corresponding layer of shell quoting to escape any shell metacharacters that should be interpreted literally. This is often cumbersome and is easy to get wrong. My Python implementation is as follows, and I wouldn't be totally surprised to discover that it contained a bug:
import re
non_meta_re = re.compile(r'^[a-zA-Z0-9+,./:=@_-]+$')
def shell_escape(arg):
    if non_meta_re.match(arg):
        return arg
    else:
        return "'%s'" % arg.replace("'", "'\\''")
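For instance, here is what it produces for a few illustrative arguments (these example calls are mine, not from the original code):
print(shell_escape("ordinary-arg"))   # -> ordinary-arg
print(shell_escape("has spaces"))     # -> 'has spaces'
print(shell_escape("don't"))          # -> 'don'\''t'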
Python >= 3.3 has shlex.quote, which is an improvement and we should probably use that instead, but it's still another thing to forget to call. This is why process-spawning libraries such as Python's subprocess, Perl's system and open, and my own libpipeline for C encourage programmers to use a list syntax and to avoid involving the shell entirely wherever possible. One thing that the standard Unix tools don't let you do in an adverbial way is to change your working directory, and I've run into this annoying limitation several times. This means that it's difficult to chain that operation together with other adverbs, for example to run a command in a particular working directory inside a chroot. The workaround I used above was to invoke a shell that runs cd /build && ..., but that's another command that's only quasi-adverbial, since the extra shell means an extra layer of shell quoting. (Ian Jackson rightly observes that you can in fact write the necessary adverb as something like sh -ec 'cd "$1"; shift; exec "$@"' chdir. I think that's a bit uglier than I ideally want to use in production code, but you might reasonably think that it's worth it to avoid the extra layer of shell quoting.) I therefore decided that this was a feature that belonged in coreutils, and after a bit of mailing list discussion we felt it was best implemented as a new option to env(1). I sent a patch for this which has been accepted. This means that we have a new composable adverb, env --chdir=NEWDIR, which will allow the run_build_command method above to be rewritten as something like this:
def run_build_command(self, args, env=None, echo=False):
    """Run a build command in the chroot.
    :param args: the command and arguments to run.
    :param env: dictionary of additional environment variables to set.
    """
    env_args = ["env", "--chdir=/build"]
    if env:
        for key, value in env.items():
            env_args.append("%s=%s" % (key, value))
    self.chroot(env_args + args, echo=echo)
The env --chdir option will be in coreutils 8.28. We won't be able to use it in launchpad-buildd until that's available in all Ubuntu series we might want to build for, so in this particular application that's going to take a few years; but other applications may well be able to make use of it sooner.
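To make the composition concrete, here is a sketch of the kind of adverbial chain this enables (the chroot path and environment variable are illustrative):
import subprocess

# Each adverb takes the next command as a plain argument list, so no
# shell - and therefore no extra quoting layer - is involved anywhere.
subprocess.check_call([
    "sudo",                                     # run as another user
    "/usr/sbin/chroot", "/srv/chroot/builder",  # with a different root
    "env", "--chdir=/build",                    # in a different directory
    "DEB_BUILD_OPTIONS=parallel=4",             # with extra environment
    "make",
])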
